Chapter 17
Neural Language Modeling

Language modeling is central to many important natural language processing tasks. Recently, neural-network-based language models have demonstrated better performance than classical methods, both standalone and as part of more challenging natural language processing tasks. In this chapter, you will discover language modeling for natural language processing. After reading this chapter, you will know:

- Why language modeling is critical to addressing tasks in natural language processing.
- What a language model is and some examples of where they are used.
- How neural networks can be used for language modeling.

Let's get started.

17.1 Overview

This tutorial is divided into the following parts:

1. Problem of Modeling Language
2. Statistical Language Modeling
3. Neural Language Models

17.2 Problem of Modeling Language

Formal languages, like programming languages, can be fully specified. All the reserved words can be defined and the valid ways that they can be used can be precisely defined. We cannot do this with natural language. Natural languages are not designed; they emerge, and therefore there is no formal specification. There may be formal rules and heuristics for parts of the language, but as soon as rules are defined, you will devise or encounter counter examples that contradict the rules. Natural languages involve vast numbers of terms that can be used in ways that introduce all kinds of ambiguities, yet can still be understood by other humans. Further, languages change, word usages change: it is a moving target.

Nevertheless, linguists try to specify the language with formal grammars and structures. It can be done, but it is very difficult and the results can be fragile. An alternative approach to specifying the model of the language is to learn it from examples.

17.3 Statistical Language Modeling

Statistical Language Modeling, or Language Modeling and LM for short, is the development of probabilistic models that are able to predict the next word in the sequence given the words that precede it.

Language modeling is the task of assigning a probability to sentences in a language. [...] Besides assigning a probability to each sequence of words, the language models also assigns a probability for the likelihood of a given word (or a sequence of words) to follow a sequence of words.

-- Page 105, Neural Network Methods in Natural Language Processing.

A language model learns the probability of word occurrence based on examples of text. Simpler models may look at a context of a short sequence of words, whereas larger models may work at the level of sentences or paragraphs. Most commonly, language models operate at the level of words.

The notion of a language model is inherently probabilistic. A language model is a function that puts a probability measure over strings drawn from some vocabulary.

-- Page 238, An Introduction to Information Retrieval.

A language model can be developed and used standalone, such as to generate new sequences of text that appear to have come from the corpus. Language modeling is a root problem for a large range of natural language processing tasks. More practically, language models are used on the front-end or back-end of a more sophisticated model for a task that requires language understanding.

... language modeling is a crucial component in real-world applications such as machine-translation and automatic speech recognition, [...] For these reasons, language modeling plays a central role in natural-language processing, AI, and machine-learning research.

-- Page 105, Neural Network Methods in Natural Language Processing.

A good example is speech recognition, where audio data is used as an input to the model and the output requires a language model that interprets the input signal and recognizes each new word within the context of the words already recognized.

Speech recognition is principally concerned with the problem of transcribing the speech signal as a sequence of words. [...] From this point of view, speech is assumed to be generated by a language model which provides estimates of Pr(w) for all word strings w independently of the observed signal [...] The goal of speech recognition is to find the most likely word sequence given the observed acoustic signal.

-- The Oxford Handbook of Computational Linguistics, 2005.
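
To make the statistical formulation concrete, the short sketch below (not from this chapter; the toy corpus and helper names are illustrative assumptions) shows the simplest kind of statistical language model: a bigram model estimated by counting, which scores a sentence as the product of the probability of each word given the word before it.

from collections import Counter, defaultdict

# a toy corpus; in practice this would be a large body of text
corpus = [
    'the king was in his counting house',
    'the queen was in the parlour',
]

# count how often each word follows each other word
bigram_counts = defaultdict(Counter)
for line in corpus:
    words = ['<s>'] + line.split()
    for prev, word in zip(words, words[1:]):
        bigram_counts[prev][word] += 1

# P(word | prev) estimated by relative frequency
def next_word_prob(prev, word):
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

# P(w1, ..., wn) approximated as the product of P(wi | wi-1)
def sentence_prob(sentence):
    words = ['<s>'] + sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= next_word_prob(prev, word)
    return prob

print(sentence_prob('the queen was in the parlour'))  # nonzero: every bigram was observed
print(sentence_prob('the maid was in the garden'))    # 0.0: the bigram (the, maid) was never observed

The zero probability assigned to the second sentence illustrates the data sparsity problem of count-based n-gram models that the neural approaches discussed below are designed to address.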

Similarly, language models are used to generate text in many similar natural language processing tasks, for example:

- Optical Character Recognition.
- Handwriting Recognition.
- Machine Translation.
- Spelling Correction.
- Image Captioning.
- Text Summarization.
- And much more.

Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction.

-- A Bit of Progress in Language Modeling.

Developing better language models often results in models that perform better on their intended natural language processing task. This is the motivation for developing better and more accurate language models.

[language models] have played a key role in traditional NLP tasks such as speech recognition, machine translation, or text summarization. Often (although not always), training better language models improves the underlying metrics of the downstream task (such as word error rate for speech recognition, or BLEU score for translation), which makes the task of training better LMs valuable by itself.

-- Exploring the Limits of Language Modeling.

17.4 Neural Language Models

Recently, the use of neural networks in the development of language models has become very popular, to the point that it may now be the preferred approach. The use of neural networks in language modeling is often called Neural Language Modeling, or NLM for short. Neural network approaches are achieving better results than classical methods, both on standalone language models and when models are incorporated into larger models on challenging tasks like speech recognition and machine translation. A key reason for the leaps in improved performance may be the method's ability to generalize.

Nonlinear neural network models solve some of the shortcomings of traditional language models: they allow conditioning on increasingly large context sizes with only a linear increase in the number of parameters, they alleviate the need for manually designing backoff orders, and they support generalization across different contexts.

-- Page 109, Neural Network Methods in Natural Language Processing.

Specifically, a word embedding is adopted that uses a real-valued vector to represent each word in a projected vector space. This learned representation of words based on their usage allows words with a similar meaning to have a similar representation.

Neural Language Models (NLM) address the n-gram data sparsity issue through parameterization of words as vectors (word embeddings) and using them as inputs to a neural network. The parameters are learned as part of the training process. Word embeddings obtained through NLMs exhibit the property whereby semantically close words are likewise close in the induced vector space.

-- Character-Aware Neural Language Model.

This generalization is something that the representation used in classical statistical language models cannot easily achieve.

True generalization is difficult to obtain in a discrete word indice space, since there is no obvious relation between the word indices.

-- Connectionist Language Modeling for Large Vocabulary Continuous Speech Recognition.

Further, the distributed representation approach allows the embedding representation to scale better with the size of the vocabulary. Classical methods that have one discrete representation per word fight the curse of dimensionality, with larger and larger vocabularies of words that result in longer and more sparse representations. The neural network approach to language modeling can be described using the following three model properties, taken from A Neural Probabilistic Language Model:

1. Associate each word in the vocabulary with a distributed word feature vector.
2. Express the joint probability function of word sequences in terms of the feature vectors of these words in the sequence.
3. Learn simultaneously the word feature vector and the parameters of the probability function.

This represents a relatively simple model where both the representation and probabilistic model are learned together directly from raw text data. Recently, the neural-based approaches have started to outperform the classical statistical approaches.

We provide ample empirical evidence to suggest that connectionist language models are superior to standard n-gram techniques, except their high computational (training) complexity.

-- Recurrent Neural Network Based Language Model.
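
As a rough illustration of these three properties (and of the feedforward models discussed next), the following is a minimal sketch of a fixed-window neural language model in Keras. It is not code from this book; the vocabulary size, context length, and embedding dimension are arbitrary assumptions.

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

vocab_size = 1000  # assumed vocabulary size
context = 3        # assumed number of preceding words used as input

model = Sequential()
# property 1: each word index is associated with a learned 50-dimensional feature vector
model.add(Embedding(vocab_size, 50, input_length=context))
model.add(Flatten())
# property 2: the probability of the next word is a function of the context word vectors
model.add(Dense(vocab_size, activation='softmax'))
# property 3: the word vectors and the probability function are learned together during training
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()

Trained on pairs of context words and the next word, such a network learns the word vectors and the output probabilities at the same time, which is the sense in which the representation and the probabilistic model are learned together.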

Initially, feedforward neural network models were used to introduce the approach. More recently, recurrent neural networks and then networks with a long-term memory, like the Long Short-Term Memory network, or LSTM, allow the models to learn the relevant context over much longer input sequences than the simpler feedforward networks.

[an RNN language model] provides further generalization: instead of considering just several preceding words, neurons with input from recurrent connections are assumed to represent short term memory. The model learns itself from the data how to represent memory. While shallow feedforward neural networks (those with just one hidden layer) can only cluster similar words, recurrent neural network (which can be considered as a deep architecture) can perform clustering of similar histories. This allows for instance efficient representation of patterns with variable length.

-- Extensions of Recurrent Neural Network Language Model.

Recently, researchers have been seeking the limits of these language models. In the paper Exploring the Limits of Language Modeling, which evaluates language models over large datasets, such as the corpus of one million words, the authors find that LSTM-based neural language models outperform the classical methods.

... we have shown that RNN LMs can be trained on large amounts of data, and outperform competing models including carefully tuned N-grams.

-- Exploring the Limits of Language Modeling.

Further, they propose some heuristics for developing high-performing neural language models in general:

- Size matters. The best models were the largest models, specifically in the number of memory units.
- Regularization matters. Use of regularization like dropout on input connections improves results.
- CNNs vs Embeddings. Character-level Convolutional Neural Network (CNN) models can be used on the front-end instead of word embeddings, achieving similar and sometimes better results.
- Ensembles matter. Combining the prediction from multiple models can offer large improvements in model performance.

17.5 Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Books

- Neural Network Methods in Natural Language Processing.
- Natural Language Processing, Artificial Intelligence: A Modern Approach.
- Language Models for Information Retrieval, An Introduction to Information Retrieval.

Papers

- A Neural Probabilistic Language Model, NIPS.
- A Neural Probabilistic Language Model, JMLR.
- Connectionist Language Modeling for Large Vocabulary Continuous Speech Recognition.
- Recurrent Neural Network Based Language Model.
- Extensions of Recurrent Neural Network Language Model.
- Character-Aware Neural Language Model.
- LSTM Neural Networks for Language Modeling.
- Exploring the Limits of Language Modeling.

Articles

- Language Model, Wikipedia.
- Neural Net Language Models, Scholarpedia.

17.6 Summary

In this chapter, you discovered language modeling for natural language processing tasks. Specifically, you learned:

- That natural language is not formally specified and requires the use of statistical models to learn from examples.
- That statistical language models are central to many challenging natural language processing tasks.
- That state-of-the-art results are achieved using neural language models, specifically those with word embeddings and recurrent neural network algorithms.

Next

In the next chapter, you will discover how you can develop a character-based neural language model.

Chapter 18
How to Develop a Character-Based Neural Language Model

A language model predicts the next word in the sequence based on the specific words that have come before it in the sequence. It is also possible to develop language models at the character level using neural networks. The benefit of character-based language models is their small vocabulary and flexibility in handling any words, punctuation, and other document structure. This comes at the cost of requiring larger models that are slower to train. Nevertheless, in the field of neural language models, character-based models offer a lot of promise for a general, flexible and powerful approach to language modeling. In this tutorial, you will discover how to develop a character-based neural language model. After completing this tutorial, you will know:

- How to prepare text for character-based language modeling.
- How to develop a character-based language model using LSTMs.
- How to use a trained character-based language model to generate text.

Let's get started.

18.1 Tutorial Overview

This tutorial is divided into the following parts:

1. Sing a Song of Sixpence
2. Data Preparation
3. Train Language Model
4. Generate Text

18.2 Sing a Song of Sixpence

The nursery rhyme Sing a Song of Sixpence is well known in the west. The first verse is common, but there is also a 4-verse version that we will use to develop our character-based language model. It is short, so fitting the model will be fast, but not so short that we won't see anything interesting. The complete 4-verse version we will use as source text is listed below.

Sing a song of sixpence,
A pocket full of rye.
Four and twenty blackbirds,
Baked in a pie.

When the pie was opened
The birds began to sing;
Wasn't that a dainty dish,
To set before the king.

The king was in his counting house,
Counting out his money;
The queen was in the parlour,
Eating bread and honey.

The maid was in the garden,
Hanging out the clothes,
When down came a blackbird
And pecked off her nose.

Listing 18.1: Sing a Song of Sixpence nursery rhyme.

Copy the text and save it in a new file in your current working directory with the file name rhyme.txt.

18.3 Data Preparation

The first step is to prepare the text data. We will start by defining the type of language model.

Language Model Design

A language model must be trained on the text, and in the case of a character-based language model, the input and output sequences must be characters. The number of characters used as input will also define the number of characters that will need to be provided to the model in order to elicit the first predicted character. After the first character has been generated, it can be appended to the input sequence and used as input for the model to generate the next character. Longer sequences offer more context for the model to learn what character to output next, but take longer to train and impose more burden on seeding the model when generating text. We will use an arbitrary length of 10 characters for this model. There is not a lot of text, and 10 characters is a few words. We can now transform the raw text into a form that our model can learn; specifically, input and output sequences of characters.
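
The short sketch below (illustrative only; the seed string and the stand-in predicted characters are assumptions, not model output) shows how a fixed-length 10-character seed rolls forward as each newly generated character is appended:

# illustrative only: how a 10-character seed rolls forward during generation
seed = 'Sing a son'                      # 10 characters of seed text
generated = seed
for next_char in ['g', ' ', 'o']:        # stand-ins for characters a model would predict
    generated += next_char               # grow the generated text
    seed = (seed + next_char)[-10:]      # keep only the last 10 characters as the next input
print(generated)                         # Sing a song o
print(seed)                              # g a song o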

Load Text

We must load the text into memory so that we can work with it. Below is a function named load_doc() that will load a text file given a filename and return the loaded text.

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

Listing 18.2: Function to load a document into memory.

We can call this function with the filename of the nursery rhyme rhyme.txt to load the text into memory. The contents of the file are then printed to screen as a sanity check.

# load text
raw_text = load_doc('rhyme.txt')
print(raw_text)

Listing 18.3: Load the document into memory.

Clean Text

Next, we need to clean the loaded text. We will not do much to it in this example. Specifically, we will strip all of the new line characters so that we have one long sequence of characters separated only by white space.

# clean
tokens = raw_text.split()
raw_text = ' '.join(tokens)

Listing 18.4: Tokenize the loaded document.

You may want to explore other methods for data cleaning, such as normalizing the case to lowercase or removing punctuation in an effort to reduce the final vocabulary size and develop a smaller and leaner model.

Create Sequences

Now that we have a long list of characters, we can create our input-output sequences used to train the model. Each input sequence will be 10 characters with one output character, making each sequence 11 characters long. We can create the sequences by enumerating the characters in the text, starting at the 11th character at index 10.

# organize into sequences of characters
length = 10
sequences = list()
for i in range(length, len(raw_text)):
    # select sequence of tokens
    seq = raw_text[i-length:i+1]
    # store
    sequences.append(seq)
print('Total Sequences: %d' % len(sequences))

Listing 18.5: Convert text into fixed-length sequences.

Running this snippet, we can see that we end up with just under 400 sequences of characters for training our language model.

Total Sequences: 399

Listing 18.6: Example output of converting text into fixed-length sequences.

Save Sequences

Finally, we can save the prepared data to file so that we can load it later when we develop our model. Below is a function save_doc() that, given a list of strings and a filename, will save the strings to file, one per line.

# save tokens to file, one dialog per line
def save_doc(lines, filename):
    data = '\n'.join(lines)
    file = open(filename, 'w')
    file.write(data)
    file.close()

Listing 18.7: Function to save sequences to file.

We can call this function and save our prepared sequences to the filename char_sequences.txt in our current working directory.

# save sequences to file
out_filename = 'char_sequences.txt'
save_doc(sequences, out_filename)

Listing 18.8: Call function to save sequences to file.

Complete Example

Tying all of this together, the complete code listing is provided below.

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# save tokens to file, one dialog per line
def save_doc(lines, filename):
    data = '\n'.join(lines)
    file = open(filename, 'w')
    file.write(data)
    file.close()

# load text
raw_text = load_doc('rhyme.txt')
print(raw_text)

# clean
tokens = raw_text.split()
raw_text = ' '.join(tokens)

# organize into sequences of characters
length = 10
sequences = list()
for i in range(length, len(raw_text)):
    # select sequence of tokens
    seq = raw_text[i-length:i+1]
    # store
    sequences.append(seq)
print('Total Sequences: %d' % len(sequences))

# save sequences to file
out_filename = 'char_sequences.txt'
save_doc(sequences, out_filename)

Listing 18.9: Complete example of preparing the text data.

Run the example to create the char_sequences.txt file. Take a look inside; you should see something like the following:

Sing a song
ing a song
ng a song o
g a song of
 a song of
a song of s
 song of si
song of six
ong of sixp
ng of sixpe
...

Listing 18.10: Sample of the output file.

We are now ready to train our character-based neural language model.

18.4 Train Language Model

In this section, we will develop a neural language model for the prepared sequence data. The model will read encoded characters and predict the next character in the sequence. A Long Short-Term Memory recurrent neural network hidden layer will be used to learn the context from the input sequence in order to make the predictions.

Load Data

The first step is to load the prepared character sequence data from char_sequences.txt. We can use the same load_doc() function developed in the previous section. Once loaded, we split the text by new line to give a list of sequences ready to be encoded.

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# load
in_filename = 'char_sequences.txt'
raw_text = load_doc(in_filename)
lines = raw_text.split('\n')

Listing 18.11: Load the prepared text data.

Encode Sequences

The sequences of characters must be encoded as integers. This means that each unique character will be assigned a specific integer value and each sequence of characters will be encoded as a sequence of integers. We can create the mapping given a sorted set of unique characters in the raw input data. The mapping is a dictionary of character values to integer values.

chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))

Listing 18.12: Create a mapping between chars and integers.

Next, we can process each sequence of characters one at a time and use the dictionary mapping to look up the integer value for each character.

sequences = list()
for line in lines:
    # integer encode line
    encoded_seq = [mapping[char] for char in line]
    # store
    sequences.append(encoded_seq)

Listing 18.13: Integer encode sequences of characters.

The result is a list of integer lists. We need to know the size of the vocabulary later. We can retrieve this as the size of the dictionary mapping.

# vocabulary size
vocab_size = len(mapping)
print('Vocabulary Size: %d' % vocab_size)

Listing 18.14: Summarize the size of the vocabulary.

Running this piece, we can see that there are 38 unique characters in the input sequence data.

Vocabulary Size: 38

Listing 18.15: Example output from summarizing the size of the vocabulary.

Split Inputs and Output

Now that the sequences have been integer encoded, we can separate the columns into input and output sequences of characters. We can do this using a simple array slice.

sequences = array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]

Listing 18.16: Split sequences into input and output elements.

Next, we need to one hot encode each character. That is, each character becomes a vector as long as the vocabulary (38 elements) with a 1 marked for the specific character. This provides a more precise input representation for the network. It also provides a clear objective for the network to predict, where a probability distribution over characters can be output by the model and compared to the ideal case of all 0 values with a 1 for the actual next character. We can use the to_categorical() function in the Keras API to one hot encode the input and output sequences.

sequences = [to_categorical(x, num_classes=vocab_size) for x in X]
X = array(sequences)
y = to_categorical(y, num_classes=vocab_size)

Listing 18.17: Convert sequences into a format ready for training.

We are now ready to fit the model.

Fit Model

The model is defined with an input layer that takes sequences that have 10 time steps and 38 features for the one hot encoded input sequences. Rather than specify these numbers, we use the second and third dimensions of the X input data. This is so that if we change the length of the sequences or size of the vocabulary, we do not need to change the model definition. The model has a single LSTM hidden layer with 75 memory cells, chosen with a little trial and error. The model has a fully connected output layer that outputs one vector with a probability distribution across all characters in the vocabulary. A softmax activation function is used on the output layer to ensure the output has the properties of a probability distribution.

# define the model
def define_model(X):
    model = Sequential()
    model.add(LSTM(75, input_shape=(X.shape[1], X.shape[2])))
    model.add(Dense(vocab_size, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

Listing 18.18: Define the language model.

The model is learning a multiclass classification problem, therefore we use the categorical log loss intended for this type of problem. The efficient Adam implementation of gradient descent is used to optimize the model and accuracy is reported at the end of each batch update. The model is fit for 100 training epochs, again found with a little trial and error.

Running this prints a summary of the defined network as a sanity check.

Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 75)                34200
dense_1 (Dense)              (None, 38)                2888
=================================================================
Total params: 37,088
Trainable params: 37,088
Non-trainable params: 0

Listing 18.19: Example output from summarizing the defined model.

A plot of the defined model is then saved to file with the name model.png.

Figure 18.1: Plot of the defined character-based language model.

Save Model

After the model is fit, we save it to file for later use. The Keras model API provides the save() function that we can use to save the model to a single file, including weights and topology information.

# save the model to file
model.save('model.h5')

Listing 18.20: Save the fit model to file.

We also save the mapping from characters to integers that we will need to encode any input when using the model and decode any output from the model.

# save the mapping
dump(mapping, open('mapping.pkl', 'wb'))

Listing 18.21: Save the mapping of chars to integers to file.

Complete Example

Tying all of this together, the complete code listing for fitting the character-based neural language model is listed below.

from numpy import array
from pickle import dump
from keras.utils import to_categorical
from keras.utils.vis_utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# define the model
def define_model(X):
    model = Sequential()
    model.add(LSTM(75, input_shape=(X.shape[1], X.shape[2])))
    model.add(Dense(vocab_size, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

# load
in_filename = 'char_sequences.txt'
raw_text = load_doc(in_filename)
lines = raw_text.split('\n')

# integer encode sequences of characters
chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))
sequences = list()
for line in lines:
    # integer encode line
    encoded_seq = [mapping[char] for char in line]
    # store
    sequences.append(encoded_seq)

# vocabulary size
vocab_size = len(mapping)
print('Vocabulary Size: %d' % vocab_size)

# separate into input and output
sequences = array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
sequences = [to_categorical(x, num_classes=vocab_size) for x in X]
X = array(sequences)
y = to_categorical(y, num_classes=vocab_size)
# define model
model = define_model(X)
# fit model
model.fit(X, y, epochs=100, verbose=2)
# save the model to file
model.save('model.h5')
# save the mapping
dump(mapping, open('mapping.pkl', 'wb'))

Listing 18.22: Complete example of training the language model.

Running the example might take one minute. You will see that the model learns the problem well, perhaps too well for generating surprising sequences of characters.

...
Epoch 96/100
0s - loss: ... - acc: ...
Epoch 97/100
0s - loss: ... - acc: ...
Epoch 98/100
0s - loss: ... - acc: ...
Epoch 99/100
0s - loss: ... - acc: ...
Epoch 100/100
0s - loss: ... - acc: ...

Listing 18.23: Example output from training the language model.

At the end of the run, you will have two files saved to the current working directory, specifically model.h5 and mapping.pkl. Next, we can look at using the learned model.

18.5 Generate Text

We will use the learned language model to generate new sequences of text that have the same statistical properties.

Load Model

The first step is to load the model saved to the file model.h5. We can use the load_model() function from the Keras API.

# load the model
model = load_model('model.h5')

Listing 18.24: Load the saved model.

We also need to load the pickled dictionary for mapping characters to integers from the file mapping.pkl. We will use the Pickle API to load the object.

# load the mapping
mapping = load(open('mapping.pkl', 'rb'))

Listing 18.25: Load the saved mapping from chars to integers.

We are now ready to use the loaded model.

Generate Characters

We must provide sequences of 10 characters as input to the model in order to start the generation process. We will pick these manually. A given input sequence will need to be prepared in the same way as preparing the training data for the model. First, the sequence of characters must be integer encoded using the loaded mapping.

# encode the characters as integers
encoded = [mapping[char] for char in in_text]

Listing 18.26: Encode input text to integers.

Next, the integers need to be one hot encoded using the to_categorical() Keras function. We also need to reshape the sequence to be 3-dimensional, as we only have one sequence and LSTMs require all input to be three dimensional (samples, time steps, features).

# one hot encode
encoded = to_categorical(encoded, num_classes=len(mapping))
encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])

Listing 18.27: One hot encode the integer encoded text.

We can then use the model to predict the next character in the sequence. We use predict_classes() instead of predict() to directly select the integer for the character with the highest probability instead of getting the full probability distribution across the entire set of characters.

# predict character
yhat = model.predict_classes(encoded, verbose=0)

Listing 18.28: Predict the next character in the sequence.

We can then decode this integer by looking up the mapping to see the character to which it maps.

out_char = ''
for char, index in mapping.items():
    if index == yhat:
        out_char = char
        break

Listing 18.29: Map the predicted integer back to a character.

This character can then be added to the input sequence. We then need to make sure that the input sequence is 10 characters by truncating the first character from the input sequence text. We can use the pad_sequences() function from the Keras API that can perform this truncation operation. Putting all of this together, we can define a new function named generate_seq() for using the loaded model to generate new sequences of text.

# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_text, n_chars):
    in_text = seed_text
    # generate a fixed number of characters
    for _ in range(n_chars):
        # encode the characters as integers
        encoded = [mapping[char] for char in in_text]
        # truncate sequences to a fixed length
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # one hot encode
        encoded = to_categorical(encoded, num_classes=len(mapping))
        encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])
        # predict character
        yhat = model.predict_classes(encoded, verbose=0)
        # reverse map integer to character
        out_char = ''
        for char, index in mapping.items():
            if index == yhat:
                out_char = char
                break
        # append to input
        in_text += out_char
    return in_text

Listing 18.30: Function to predict a sequence of characters given seed text.

Complete Example

Tying all of this together, the complete example for generating text using the fit neural language model is listed below.

from pickle import load
from keras.models import load_model
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences

# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_text, n_chars):
    in_text = seed_text
    # generate a fixed number of characters
    for _ in range(n_chars):
        # encode the characters as integers
        encoded = [mapping[char] for char in in_text]
        # truncate sequences to a fixed length
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # one hot encode
        encoded = to_categorical(encoded, num_classes=len(mapping))
        encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])
        # predict character
        yhat = model.predict_classes(encoded, verbose=0)
        # reverse map integer to character
        out_char = ''
        for char, index in mapping.items():
            if index == yhat:
                out_char = char
                break
        # append to input
        in_text += out_char
    return in_text

# load the model
model = load_model('model.h5')
# load the mapping
mapping = load(open('mapping.pkl', 'rb'))
# test start of rhyme
print(generate_seq(model, mapping, 10, 'Sing a son', 20))
# test mid-line
print(generate_seq(model, mapping, 10, 'king was i', 20))
# test not in original
print(generate_seq(model, mapping, 10, 'hello worl', 20))

Listing 18.31: Complete example of generating characters with the fit model.

Running the example generates three sequences of text. The first is a test to see how the model does at starting from the beginning of the rhyme. The second is a test to see how well it does at beginning in the middle of a line. The final example is a test to see how well it does with a sequence of characters never seen before.

Note: Given the stochastic nature of neural networks, your specific results may vary. Consider running the example a few times.

Sing a song of sixpence, A poc
king was in his counting house
hello worls e pake wofey. The

Listing 18.32: Example output from generating sequences of characters.

We can see that the model did very well with the first two examples, as we would expect. We can also see that the model still generated something for the new text, but it is nonsense.

18.6 Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- Sing a Song of Sixpence on Wikipedia.
- Keras Utils API.
- Keras Sequence Processing API.

18.7 Summary

In this tutorial, you discovered how to develop a character-based neural language model. Specifically, you learned:

- How to prepare text for character-based language modeling.
- How to develop a character-based language model using LSTMs.
- How to use a trained character-based language model to generate text.

Next

In the next chapter, you will discover how you can develop a word-based neural language model.

Chapter 19
How to Develop a Word-Based Neural Language Model

Language modeling involves predicting the next word in a sequence given the sequence of words already present. A language model is a key element in many natural language processing models such as machine translation and speech recognition. The choice of how the language model is framed must match how the language model is intended to be used. In this tutorial, you will discover how the framing of a language model affects the skill of the model when generating short sequences from a nursery rhyme. After completing this tutorial, you will know:

- The challenge of developing a good framing of a word-based language model for a given application.
- How to develop one-word, two-word, and line-based framings for word-based language models.
- How to generate sequences using a fit language model.

Let's get started.

19.1 Tutorial Overview

This tutorial is divided into the following parts:

1. Framing Language Modeling
2. Jack and Jill Nursery Rhyme
3. Model 1: One-Word-In, One-Word-Out Sequences
4. Model 2: Line-by-Line Sequence
5. Model 3: Two-Words-In, One-Word-Out Sequence

19.2 Framing Language Modeling

A statistical language model is learned from raw text and predicts the probability of the next word in the sequence given the words already present in the sequence. Language models are a key component in larger models for challenging natural language processing problems, like machine translation and speech recognition. They can also be developed as standalone models and used for generating new sequences that have the same statistical properties as the source text.

Language models both learn and predict one word at a time. The training of the network involves providing sequences of words as input that are processed one at a time, where a prediction can be made and learned for each input sequence. Similarly, when making predictions, the process can be seeded with one or a few words, then predicted words can be gathered and presented as input on subsequent predictions in order to build up a generated output sequence.

Therefore, each model will involve splitting the source text into input and output sequences, such that the model can learn to predict words. There are many ways to frame the sequences from a source text for language modeling. In this tutorial, we will explore 3 different ways of developing word-based language models in the Keras deep learning library. There is no single best approach, just different framings that may suit different applications.

19.3 Jack and Jill Nursery Rhyme

Jack and Jill is a simple nursery rhyme. It is comprised of 4 lines, as follows:

Jack and Jill went up the hill
To fetch a pail of water
Jack fell down and broke his crown
And Jill came tumbling after

Listing 19.1: Jack and Jill nursery rhyme.

We will use this as our source text for exploring different framings of a word-based language model. We can define this text in Python as follows:

# source text
data = """ Jack and Jill went up the hill\n
    To fetch a pail of water\n
    Jack fell down and broke his crown\n
    And Jill came tumbling after\n """

Listing 19.2: Sample text for this tutorial.

19.4 Model 1: One-Word-In, One-Word-Out Sequences

We can start with a very simple model. Given one word as input, the model will learn to predict the next word in the sequence. For example:

X,     y
Jack,  and
and,   Jill
Jill,  went

Listing 19.3: Example of input and output pairs.

The first step is to encode the text as integers. Each lowercase word in the source text is assigned a unique integer and we can convert the sequences of words to sequences of integers. Keras provides the Tokenizer class that can be used to perform this encoding. First, the Tokenizer is fit on the source text to develop the mapping from words to unique integers. Then sequences of text can be converted to sequences of integers by calling the texts_to_sequences() function.

# integer encode text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
encoded = tokenizer.texts_to_sequences([data])[0]

Listing 19.4: Example of training a Tokenizer on the sample text.

We will need to know the size of the vocabulary later for both defining the word embedding layer in the model, and for encoding output words using a one hot encoding. The size of the vocabulary can be retrieved from the trained Tokenizer by accessing the word_index attribute.

# determine the vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)

Listing 19.5: Summarize the size of the vocabulary.

Running this example, we can see that the size of the vocabulary is 21 words. We add one, because we will need to specify the integer for the largest encoded word as an array index, e.g. words encoded 1 to 21 need array indices 0 to 21, or 22 positions. Next, we need to create sequences of words to fit the model with one word as input and one word as output.

# create word -> word sequences
sequences = list()
for i in range(1, len(encoded)):
    sequence = encoded[i-1:i+1]
    sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))

Listing 19.6: Example of encoding the source text.

Running this piece shows that we have a total of 24 input-output pairs to train the network.

Total Sequences: 24

Listing 19.7: Example output of summarizing the encoded text.

We can then split the sequences into input (X) and output elements (y). This is straightforward as we only have two columns in the data.

# split into X and y elements
sequences = array(sequences)
X, y = sequences[:,0], sequences[:,1]

Listing 19.8: Split the encoded text into input and output pairs.

We will fit our model to predict a probability distribution across all words in the vocabulary. That means that we need to turn the output element from a single integer into a one hot encoding with a 0 for every word in the vocabulary and a 1 for the actual word. This gives the network a ground truth to aim for, from which we can calculate error and update the model. Keras provides the to_categorical() function that we can use to convert the integer to a one hot encoding while specifying the number of classes as the vocabulary size.

# one hot encode outputs
y = to_categorical(y, num_classes=vocab_size)

Listing 19.9: One hot encode the output words.

We are now ready to define the neural network model. The model uses a learned word embedding in the input layer. This has one real-valued vector for each word in the vocabulary, where each word vector has a specified length. In this case we will use a 10-dimensional projection. The input sequence contains a single word, therefore input_length=1. The model has a single hidden LSTM layer with 50 units. This is far more than is needed. The output layer is comprised of one neuron for each word in the vocabulary and uses a softmax activation function to ensure the output is normalized to look like a probability.

# define the model
def define_model(vocab_size):
    model = Sequential()
    model.add(Embedding(vocab_size, 10, input_length=1))
    model.add(LSTM(50))
    model.add(Dense(vocab_size, activation='softmax'))
    # compile network
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

Listing 19.10: Define and compile the language model.

The structure of the network can be summarized as follows:

Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 1, 10)             220
lstm_1 (LSTM)                (None, 50)                12200
dense_1 (Dense)              (None, 22)                1122
=================================================================
Total params: 13,542
Trainable params: 13,542
Non-trainable params: 0

Listing 19.11: Example output summarizing the defined model.

A plot of the defined model is then saved to file with the name model.png.

Figure 19.1: Plot of the defined word-based language model.

We will use this same general network structure for each example in this tutorial, with minor changes to the learned embedding layer. We can compile and fit the network on the encoded text data. Technically, we are modeling a multiclass classification problem (predict the word in the vocabulary), therefore using the categorical cross entropy loss function. We use the efficient Adam implementation of gradient descent and track accuracy at the end of each epoch. The model is fit for 500 training epochs, again, perhaps more than is needed. The network configuration was not tuned for this and later experiments; an over-prescribed configuration was chosen to ensure that we could focus on the framing of the language model.

After the model is fit, we test it by passing it a given word from the vocabulary and having the model predict the next word. Here we pass in Jack by encoding it and calling model.predict_classes() to get the integer output for the predicted word. This is then looked up in the vocabulary mapping to give the associated word.

# evaluate
in_text = 'Jack'
print(in_text)
encoded = tokenizer.texts_to_sequences([in_text])[0]
encoded = array(encoded)
yhat = model.predict_classes(encoded, verbose=0)
for word, index in tokenizer.word_index.items():
    if index == yhat:
        print(word)

Listing 19.12: Evaluate the fit language model.

This process could then be repeated a few times to build up a generated sequence of words. To make this easier, we wrap up the behavior in a function that we can call by passing in our model and the seed word.

# generate a sequence from the model
def generate_seq(model, tokenizer, seed_text, n_words):
    in_text, result = seed_text, seed_text
    # generate a fixed number of words
    for _ in range(n_words):
        # encode the text as integer
        encoded = tokenizer.texts_to_sequences([in_text])[0]
        encoded = array(encoded)
        # predict a word in the vocabulary
        yhat = model.predict_classes(encoded, verbose=0)
        # map predicted word index to word
        out_word = ''
        for word, index in tokenizer.word_index.items():
            if index == yhat:
                out_word = word
                break
        # append to input
        in_text, result = out_word, result + ' ' + out_word
    return result

Listing 19.13: Function to generate output sequences given a fit model.

We can tie all of this together. The complete code listing is provided below.

from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from keras.utils.vis_utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding

# generate a sequence from the model
def generate_seq(model, tokenizer, seed_text, n_words):
    in_text, result = seed_text, seed_text
    # generate a fixed number of words
    for _ in range(n_words):
        # encode the text as integer
        encoded = tokenizer.texts_to_sequences([in_text])[0]
        encoded = array(encoded)
        # predict a word in the vocabulary
        yhat = model.predict_classes(encoded, verbose=0)
        # map predicted word index to word
        out_word = ''
        for word, index in tokenizer.word_index.items():
            if index == yhat:
                out_word = word
                break
        # append to input
        in_text, result = out_word, result + ' ' + out_word
    return result

# define the model
def define_model(vocab_size):
    model = Sequential()
    model.add(Embedding(vocab_size, 10, input_length=1))
    model.add(LSTM(50))
    model.add(Dense(vocab_size, activation='softmax'))
    # compile network
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # summarize defined model
    model.summary()
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

# source text
data = """ Jack and Jill went up the hill\n
    To fetch a pail of water\n
    Jack fell down and broke his crown\n
    And Jill came tumbling after\n """
# integer encode text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
encoded = tokenizer.texts_to_sequences([data])[0]
# determine the vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
# create word -> word sequences
sequences = list()
for i in range(1, len(encoded)):
    sequence = encoded[i-1:i+1]
    sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))
# split into X and y elements
sequences = array(sequences)
X, y = sequences[:,0], sequences[:,1]
# one hot encode outputs
y = to_categorical(y, num_classes=vocab_size)
# define model
model = define_model(vocab_size)
# fit network
model.fit(X, y, epochs=500, verbose=2)
# evaluate
print(generate_seq(model, tokenizer, 'Jack', 6))

Listing 19.14: Complete example of Model 1.

Running the example prints the loss and accuracy each training epoch.

...
Epoch 496/500
0s - loss: ... - acc: ...
Epoch 497/500
0s - loss: ... - acc: ...
Epoch 498/500
0s - loss: ... - acc: ...
Epoch 499/500
0s - loss: ... - acc: ...
Epoch 500/500
0s - loss: ... - acc: ...

Listing 19.15: Example output of fitting the language model.

We can see that the model does not memorize the source sequences, likely because there is some ambiguity in the input sequences, for example:

jack => and
jack => fell

Listing 19.16: Example output of predicting the next word.

And so on. At the end of the run, Jack is passed in and a prediction or new sequence is generated. We get a reasonable sequence as output that has some elements of the source.

Note: Given the stochastic nature of neural networks, your specific results may vary. Consider running the example a few times.

Jack and jill came tumbling after down

Listing 19.17: Example output of predicting a sequence of words.

This is a good first cut language model, but it does not take full advantage of the LSTM's ability to handle sequences of input and disambiguate some of the ambiguous pairwise sequences by using a broader context.

19.5 Model 2: Line-by-Line Sequence

Another approach is to split up the source text line-by-line, then break each line down into a series of words that build up. For example:

X,                               y
_, _, _, _, _, Jack,             and
_, _, _, _, Jack, and,           Jill
_, _, _, Jack, and, Jill,        went
_, _, Jack, and, Jill, went,     up
_, Jack, and, Jill, went, up,    the
Jack, and, Jill, went, up, the,  hill

Listing 19.18: Example framing of the problem as sequences of words.

This approach may allow the model to use the context of each line to help the model in those cases where a simple one-word-in-and-out model creates ambiguity. In this case, this comes at the cost of predicting words across lines, which might be fine for now if we are only interested in modeling and generating lines of text. Note that in this representation, we will require a padding of sequences to ensure they meet a fixed-length input. This is a requirement when using Keras. First, we can create the sequences of integers, line-by-line, by using the Tokenizer already fit on the source text.

# create line-based sequences
sequences = list()
for line in data.split('\n'):
    encoded = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(encoded)):
        sequence = encoded[:i+1]
        sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))

Listing 19.19: Example of preparing sequences of words.
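
These line-based sequences have different lengths, and as noted above, Keras requires a fixed-length input, so the sequences must be padded. Below is a minimal sketch of that padding step (not code from this chapter; the max_length variable is an assumption), using the same pad_sequences() Keras function used in the previous chapter:

from keras.preprocessing.sequence import pad_sequences

# pad input sequences to the length of the longest sequence (zeros are added at the front)
max_length = max(len(seq) for seq in sequences)
sequences = pad_sequences(sequences, maxlen=max_length, padding='pre')
print('Max Sequence Length: %d' % max_length)

The padded sequences can then be split into input and output elements and used to fit a model in the same way as Model 1.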


More information

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning, A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,

More information

EECS 496 Statistical Language Models. Winter 2018

EECS 496 Statistical Language Models. Winter 2018 EECS 496 Statistical Language Models Winter 2018 Introductions Professor: Doug Downey Course web site: www.cs.northwestern.edu/~ddowney/courses/496_winter2018 (linked off prof. home page) Logistics Grading

More information

Deep Learning. Architecture Design for. Sargur N. Srihari

Deep Learning. Architecture Design for. Sargur N. Srihari Architecture Design for Deep Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation

More information

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

COMP 551 Applied Machine Learning Lecture 16: Deep Learning COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all

More information

Sequence Modeling: Recurrent and Recursive Nets. By Pyry Takala 14 Oct 2015

Sequence Modeling: Recurrent and Recursive Nets. By Pyry Takala 14 Oct 2015 Sequence Modeling: Recurrent and Recursive Nets By Pyry Takala 14 Oct 2015 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, 10.2.1) Properties of RNNs (10.2.2, 8.2.6) Using

More information

Deep Learning Applications

Deep Learning Applications October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning

More information

Empirical Evaluation of RNN Architectures on Sentence Classification Task

Empirical Evaluation of RNN Architectures on Sentence Classification Task Empirical Evaluation of RNN Architectures on Sentence Classification Task Lei Shen, Junlin Zhang Chanjet Information Technology lorashen@126.com, zhangjlh@chanjet.com Abstract. Recurrent Neural Networks

More information

Recurrent Neural Networks

Recurrent Neural Networks Recurrent Neural Networks 11-785 / Fall 2018 / Recitation 7 Raphaël Olivier Recap : RNNs are magic They have infinite memory They handle all kinds of series They re the basis of recent NLP : Translation,

More information

Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning

Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning Emel: Deep Learning in One Line using Type Inference, Architecture Selection, and Hyperparameter Tuning BARAK OSHRI and NISHITH KHANDWALA We present Emel, a new framework for training baseline supervised

More information

27: Hybrid Graphical Models and Neural Networks

27: Hybrid Graphical Models and Neural Networks 10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look

More information

MoonRiver: Deep Neural Network in C++

MoonRiver: Deep Neural Network in C++ MoonRiver: Deep Neural Network in C++ Chung-Yi Weng Computer Science & Engineering University of Washington chungyi@cs.washington.edu Abstract Artificial intelligence resurges with its dramatic improvement

More information

Deep Learning and Its Applications

Deep Learning and Its Applications Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent

More information

Tutorial on Machine Learning Tools

Tutorial on Machine Learning Tools Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow

More information

Sentiment Classification of Food Reviews

Sentiment Classification of Food Reviews Sentiment Classification of Food Reviews Hua Feng Department of Electrical Engineering Stanford University Stanford, CA 94305 fengh15@stanford.edu Ruixi Lin Department of Electrical Engineering Stanford

More information

Recurrent Neural Nets II

Recurrent Neural Nets II Recurrent Neural Nets II Steven Spielberg Pon Kumar, Tingke (Kevin) Shen Machine Learning Reading Group, Fall 2016 9 November, 2016 Outline 1 Introduction 2 Problem Formulations with RNNs 3 LSTM for Optimization

More information

End-To-End Spam Classification With Neural Networks

End-To-End Spam Classification With Neural Networks End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

Lecture 7: Neural network acoustic models in speech recognition

Lecture 7: Neural network acoustic models in speech recognition CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic

More information

Machine Learning Practice and Theory

Machine Learning Practice and Theory Machine Learning Practice and Theory Day 9 - Feature Extraction Govind Gopakumar IIT Kanpur 1 Prelude 2 Announcements Programming Tutorial on Ensemble methods, PCA up Lecture slides for usage of Neural

More information

Neural Network Optimization and Tuning / Spring 2018 / Recitation 3

Neural Network Optimization and Tuning / Spring 2018 / Recitation 3 Neural Network Optimization and Tuning 11-785 / Spring 2018 / Recitation 3 1 Logistics You will work through a Jupyter notebook that contains sample and starter code with explanations and comments throughout.

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

Neural networks. About. Linear function approximation. Spyros Samothrakis Research Fellow, IADS University of Essex.

Neural networks. About. Linear function approximation. Spyros Samothrakis Research Fellow, IADS University of Essex. Neural networks Spyros Samothrakis Research Fellow, IADS University of Essex About Linear function approximation with SGD From linear regression to neural networks Practical aspects February 28, 2017 Conclusion

More information

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance

More information

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition

A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition Théodore Bluche, Hermann Ney, Christopher Kermorvant SLSP 14, Grenoble October

More information

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center Machine Learning With Python Bin Chen Nov. 7, 2017 Research Computing Center Outline Introduction to Machine Learning (ML) Introduction to Neural Network (NN) Introduction to Deep Learning NN Introduction

More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple

More information

Generative Adversarial Text to Image Synthesis

Generative Adversarial Text to Image Synthesis Generative Adversarial Text to Image Synthesis Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee Presented by: Jingyao Zhan Contents Introduction Related Work Method

More information

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks Pramod Srinivasan CS591txt - Text Mining Seminar University of Illinois, Urbana-Champaign April 8, 2016 Pramod Srinivasan

More information

CIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm

CIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm CIS581: Computer Vision and Computational Photography Project 4, Part B: Convolutional Neural Networks (CNNs) Due: Dec.11, 2017 at 11:59 pm Instructions CNNs is a team project. The maximum size of a team

More information

Deep Learning in NLP. Horacio Rodríguez. AHLT Deep Learning 2 1

Deep Learning in NLP. Horacio Rodríguez. AHLT Deep Learning 2 1 Deep Learning in NLP Horacio Rodríguez AHLT Deep Learning 2 1 Outline Introduction Short review of Distributional Semantics, Semantic spaces, VSM, Embeddings Embedding of words Embedding of more complex

More information

LSTM: An Image Classification Model Based on Fashion-MNIST Dataset

LSTM: An Image Classification Model Based on Fashion-MNIST Dataset LSTM: An Image Classification Model Based on Fashion-MNIST Dataset Kexin Zhang, Research School of Computer Science, Australian National University Kexin Zhang, U6342657@anu.edu.au Abstract. The application

More information

Deep Learning Cook Book

Deep Learning Cook Book Deep Learning Cook Book Robert Haschke (CITEC) Overview Input Representation Output Layer + Cost Function Hidden Layer Units Initialization Regularization Input representation Choose an input representation

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

Real-time Gesture Pattern Classification with IMU Data

Real-time Gesture Pattern Classification with IMU Data Real-time Gesture Pattern Classification with IMU Data Alex Fu Stanford University Computer Science Department alexfu@stanford.edu Yangyang Yu Stanford University Electrical Engineering Department yyu10@stanford.edu

More information

Semantic text features from small world graphs

Semantic text features from small world graphs Semantic text features from small world graphs Jurij Leskovec 1 and John Shawe-Taylor 2 1 Carnegie Mellon University, USA. Jozef Stefan Institute, Slovenia. jure@cs.cmu.edu 2 University of Southampton,UK

More information

Recurrent Neural Networks. Nand Kishore, Audrey Huang, Rohan Batra

Recurrent Neural Networks. Nand Kishore, Audrey Huang, Rohan Batra Recurrent Neural Networks Nand Kishore, Audrey Huang, Rohan Batra Roadmap Issues Motivation 1 Application 1: Sequence Level Training 2 Basic Structure 3 4 Variations 5 Application 3: Image Classification

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Abstract: This project aims at creating a benchmark for Deep Learning (DL) algorithms

More information

Deep Learning for Computer Vision II

Deep Learning for Computer Vision II IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L

More information

VISION & LANGUAGE From Captions to Visual Concepts and Back

VISION & LANGUAGE From Captions to Visual Concepts and Back VISION & LANGUAGE From Captions to Visual Concepts and Back Brady Fowler & Kerry Jones Tuesday, February 28th 2017 CS 6501-004 VICENTE Agenda Problem Domain Object Detection Language Generation Sentence

More information

DEEP LEARNING IN PYTHON. Introduction to deep learning

DEEP LEARNING IN PYTHON. Introduction to deep learning DEEP LEARNING IN PYTHON Introduction to deep learning Imagine you work for a bank You need to predict how many transactions each customer will make next year Example as seen by linear regression Age Bank

More information

Deep Neural Networks Applications in Handwriting Recognition

Deep Neural Networks Applications in Handwriting Recognition Deep Neural Networks Applications in Handwriting Recognition 2 Who am I? Théodore Bluche PhD defended at Université Paris-Sud last year Deep Neural Networks for Large Vocabulary Handwritten

More information

Bayesian model ensembling using meta-trained recurrent neural networks

Bayesian model ensembling using meta-trained recurrent neural networks Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya

More information

Natural Language Processing Basics. Yingyu Liang University of Wisconsin-Madison

Natural Language Processing Basics. Yingyu Liang University of Wisconsin-Madison Natural Language Processing Basics Yingyu Liang University of Wisconsin-Madison Natural language Processing (NLP) The processing of the human languages by computers One of the oldest AI tasks One of the

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

Layerwise Interweaving Convolutional LSTM

Layerwise Interweaving Convolutional LSTM Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States

More information

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

DeepFace: Closing the Gap to Human-Level Performance in Face Verification DeepFace: Closing the Gap to Human-Level Performance in Face Verification Report on the paper Artem Komarichev February 7, 2016 Outline New alignment technique New DNN architecture New large dataset with

More information

XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework

XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework Demo Paper Joerg Evermann 1, Jana-Rebecca Rehse 2,3, and Peter Fettke 2,3 1 Memorial University of Newfoundland 2 German Research

More information

Dialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning

Dialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning Dialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning Julien Perez 1 and Y-Lan Boureau 2 and Antoine Bordes 2 1 Naver Labs Europe, Grenoble, France 2 Facebook

More information

A Simple (?) Exercise: Predicting the Next Word

A Simple (?) Exercise: Predicting the Next Word CS11-747 Neural Networks for NLP A Simple (?) Exercise: Predicting the Next Word Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Are These Sentences OK? Jane went to the store. store to Jane

More information

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading

More information

Machine Learning. MGS Lecture 3: Deep Learning

Machine Learning. MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ WHAT IS DEEP LEARNING? Shallow network: Only one hidden layer

More information

Efficient Algorithms may not be those we think

Efficient Algorithms may not be those we think Efficient Algorithms may not be those we think Yann LeCun, Computational and Biological Learning Lab The Courant Institute of Mathematical Sciences New York University http://yann.lecun.com http://www.cs.nyu.edu/~yann

More information

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of

More information

Stating the obvious, people and computers do not speak the same language.

Stating the obvious, people and computers do not speak the same language. 3.4 SYSTEM SOFTWARE 3.4.3 TRANSLATION SOFTWARE INTRODUCTION Stating the obvious, people and computers do not speak the same language. People have to write programs in order to instruct a computer what

More information

Neural Networks for Machine Learning. Lecture 15a From Principal Components Analysis to Autoencoders

Neural Networks for Machine Learning. Lecture 15a From Principal Components Analysis to Autoencoders Neural Networks for Machine Learning Lecture 15a From Principal Components Analysis to Autoencoders Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed Principal Components

More information

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 CS 1674: Intro to Computer Vision Neural Networks Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 Announcements Please watch the videos I sent you, if you haven t yet (that s your reading)

More information

Index. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning,

Index. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning, Index A Algorithmic noise tolerance (ANT), 93 94 Application specific instruction set processors (ASIPs), 115 116 Approximate computing application level, 95 circuits-levels, 93 94 DAS and DVAS, 107 110

More information

Deep Neural Networks Applications in Handwriting Recognition

Deep Neural Networks Applications in Handwriting Recognition Deep Neural Networks Applications in Handwriting Recognition Théodore Bluche theodore.bluche@gmail.com São Paulo Meetup - 9 Mar. 2017 2 Who am I? Théodore Bluche PhD defended

More information

A Deep Learning primer

A Deep Learning primer A Deep Learning primer Riccardo Zanella r.zanella@cineca.it SuperComputing Applications and Innovation Department 1/21 Table of Contents Deep Learning: a review Representation Learning methods DL Applications

More information

Novel Lossy Compression Algorithms with Stacked Autoencoders

Novel Lossy Compression Algorithms with Stacked Autoencoders Novel Lossy Compression Algorithms with Stacked Autoencoders Anand Atreya and Daniel O Shea {aatreya, djoshea}@stanford.edu 11 December 2009 1. Introduction 1.1. Lossy compression Lossy compression is

More information

A Hybrid Neural Model for Type Classification of Entity Mentions

A Hybrid Neural Model for Type Classification of Entity Mentions A Hybrid Neural Model for Type Classification of Entity Mentions Motivation Types group entities to categories Entity types are important for various NLP tasks Our task: predict an entity mention s type

More information

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( )

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( ) Structure: 1. Introduction 2. Problem 3. Neural network approach a. Architecture b. Phases of CNN c. Results 4. HTM approach a. Architecture b. Setup c. Results 5. Conclusion 1.) Introduction Artificial

More information

Multi-Glance Attention Models For Image Classification

Multi-Glance Attention Models For Image Classification Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We

More information

Outlier detection using autoencoders

Outlier detection using autoencoders Outlier detection using autoencoders August 19, 2016 Author: Olga Lyudchik Supervisors: Dr. Jean-Roch Vlimant Dr. Maurizio Pierini CERN Non Member State Summer Student Report 2016 Abstract Outlier detection

More information

Fuzzy Set Theory in Computer Vision: Example 3, Part II

Fuzzy Set Theory in Computer Vision: Example 3, Part II Fuzzy Set Theory in Computer Vision: Example 3, Part II Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Resource; CS231n: Convolutional Neural Networks for Visual Recognition https://github.com/tuanavu/stanford-

More information

Report: Privacy-Preserving Classification on Deep Neural Network

Report: Privacy-Preserving Classification on Deep Neural Network Report: Privacy-Preserving Classification on Deep Neural Network Janno Veeorg Supervised by Helger Lipmaa and Raul Vicente Zafra May 25, 2017 1 Introduction In this report we consider following task: how

More information

16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text

16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text 16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning Spring 2018 Lecture 14. Image to Text Input Output Classification tasks 4/1/18 CMU 16-785: Integrated Intelligence in Robotics

More information

RNNs as Directed Graphical Models

RNNs as Directed Graphical Models RNNs as Directed Graphical Models Sargur Srihari srihari@buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 10. Topics in Sequence Modeling Overview

More information